Rates of Bootstrap Approximation for Eigenvalues in High-Dimensional PCA
Authors
Abstract
In the context of principal components analysis (PCA), the bootstrap is commonly applied to solve a variety of inference problems, such as constructing confidence intervals for the eigenvalues of the population covariance matrix $\Sigma$. However, when the data are high-dimensional, there are relatively few theoretical guarantees that quantify the performance of the bootstrap. Our aim in this paper is to analyze how well the bootstrap can approximate the joint distribution of the leading eigenvalues of the sample covariance matrix $\hat\Sigma$, and we establish non-asymptotic rates of approximation with respect to the multivariate Kolmogorov metric. Under certain assumptions, we show that the bootstrap can achieve the dimension-free rate ${\tt{r}}(\Sigma)/\sqrt n$ up to logarithmic factors, where ${\tt{r}}(\Sigma)$ is the effective rank of $\Sigma$ and $n$ is the sample size. From a methodological standpoint, our work also illustrates that applying a suitable transformation to the eigenvalues of $\hat\Sigma$ before bootstrapping is an important consideration in high-dimensional settings.
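The basic procedure analyzed in the abstract can be sketched as follows: a minimal nonparametric bootstrap for the leading eigenvalues of the sample covariance matrix, yielding percentile confidence intervals. This is an illustrative sketch only; the data-generating model, the sizes `n`, `p`, `k`, and `B`, and the omission of the eigenvalue transformation discussed in the paper are all simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n observations in dimension p with decaying variances
# (illustrative choice, not tied to the paper's setting).
n, p, k = 200, 50, 3
X = rng.standard_normal((n, p)) * np.linspace(2.0, 0.5, p)

def leading_eigvals(A, k):
    # Largest k eigenvalues of a symmetric matrix, in decreasing order.
    w = np.linalg.eigvalsh(A)
    return w[::-1][:k]

Sigma_hat = np.cov(X, rowvar=False)
lam_hat = leading_eigvals(Sigma_hat, k)

# Sample analogue of the effective rank r(Sigma) = tr(Sigma) / ||Sigma||_op,
# which governs the approximation rate r(Sigma)/sqrt(n) in the abstract.
eff_rank = np.trace(Sigma_hat) / lam_hat[0]

# Nonparametric bootstrap: resample rows with replacement and
# recompute the leading eigenvalues of the resampled covariance.
B = 500
boot = np.empty((B, k))
for b in range(B):
    idx = rng.integers(0, n, size=n)
    boot[b] = leading_eigvals(np.cov(X[idx], rowvar=False), k)

# Percentile 95% confidence intervals for each leading eigenvalue.
lo, hi = np.percentile(boot, [2.5, 97.5], axis=0)
```

In high dimensions, the paper's point is that resampling the raw eigenvalues as above can perform poorly, which is why a transformation of $\hat\Sigma$'s eigenvalues before bootstrapping becomes important.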
Similar Papers
Minimax Rates of Estimation for Sparse PCA in High Dimensions
We study sparse principal components analysis in the high-dimensional setting, where p (the number of variables) can be much larger than n (the number of observations). We prove optimal, non-asymptotic lower and upper bounds on the minimax estimation error for the leading eigenvector when it belongs to an lq ball for q ∈ [0, 1]. Our bounds are sharp in p and n for all q ∈ [0, 1] over a wide cla...
Robust PCA for High-Dimensional Data
We consider the dimensionality-reduction problem for a contaminated data set in a very high dimensional space, i.e., the problem of finding a subspace approximation of observed data, where the number of observations is of the same magnitude as the number of variables of each observation, and the data set contains some outlying observations. We propose a High-dimension Robust Principal Component...
Influential Features PCA for High Dimensional Clustering
We consider a clustering problem where we observe feature vectors Xi ∈ R^p, i = 1, 2, . . . , n, from K possible classes. The class labels are unknown and the main interest is to estimate them. We are primarily interested in the modern regime of p ≫ n, where classical clustering methods face challenges. We propose Influential Features PCA (IF-PCA) as a new clustering procedure. In IF-PCA, we select...
PCA learning for sparse high-dimensional data
We study the performance of principal component analysis (PCA). In particular, we consider the problem of how many training pattern vectors are required to accurately represent the low-dimensional structure of the data. This problem is of particular relevance now that PCA is commonly applied to extremely high-dimensional (N ≈ 5000–30000) real data sets produced from molecular-biology research p...
Important Features PCA for high dimensional clustering
We consider a clustering problem where we observe feature vectors Xi ∈ R^p, i = 1, 2, . . . , n, from K possible classes. The class labels are unknown and the main interest is to estimate them. We are primarily interested in the modern regime of p ≫ n, where classical clustering methods face challenges. We propose Important Features PCA (IF-PCA) as a new clustering procedure. In IF-PCA, we select a ...
Journal
Journal title: Statistica Sinica
Year: 2024
ISSN: 1017-0405, 1996-8507
DOI: https://doi.org/10.5705/ss.202021.0158